22 research outputs found

    Query Optimization in a Heterogeneous CPU/GPU Environment for Time Series Databases (Optymalizacja zapytań w środowisku heterogenicznym CPU/GPU dla baz danych szeregów czasowych)

    In recent years, the processing and exploration of time series has attracted noticeable interest. Growing volumes of data and the need for efficient processing have pushed research in new directions, including hardware-based solutions. Graphics Processing Units (GPUs) have significantly more applications than just rendering images. They are also used in general-purpose computing to solve problems that can benefit from massively parallel processing. Numerous reports confirm the effectiveness of GPUs in scientific and industrial applications. However, several issues related to using a GPU as a database coprocessor must be considered. First, all computations on the GPU are preceded by time-consuming memory transfers. In this thesis we present a study of lossless lightweight compression algorithms in the context of GPU computations and time series database systems. We discuss the algorithms, their applications, and their implementation details on the GPU. We analyse their influence on data processing efficiency, taking into account both the data transfer time and the decompression time. Moreover, we propose a data-adaptive compression planner based on those algorithms, which uses a hierarchy of multiple compression algorithms to further reduce the data size. Second, there are tasks that are poorly suited to the GPU or fit it only partially. This may be related to the size or type of the task. We elaborate on a heterogeneous CPU/GPU computation environment and an optimization method that seeks equilibrium between these two computation platforms. This method is based on a heuristic search for bi-objective optimal execution plans. The underlying model mimics the commodity market, where devices are producers and queries are consumers. The value of the resources of computing devices is controlled by the laws of supply and demand. 
Our model of the optimization criteria makes it possible to find solutions for heterogeneous query processing problems where existing methods have been ineffective. Furthermore, it offers lower time complexity and higher accuracy than other methods. The dissertation also discusses an example application of time series databases: the analysis of zebra mussel (Dreissena polymorpha) behaviour based on observations of changes in the gap between the valves, collected as a time series. We propose a new algorithm based on wavelets and kernel methods that detects relevant events in the collected data. This algorithm allows us to extract elementary behaviour events from the observations. Moreover, we propose an efficient framework for automatic classification to separate control and stress conditions. Since zebra mussels are well-known bioindicators, this is an important step towards the creation of an advanced environmental biomonitoring system.

    Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

    The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and a ranking of the related research papers relying on a learning-to-rank approach.

    Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action

    The rapid development of machine learning (ML) techniques has opened up the data-dense field of microbiome research for novel therapeutic, diagnostic, and prognostic applications targeting a wide range of disorders, which could substantially improve healthcare practices in the era of precision medicine. However, several challenges must be addressed to fully exploit the benefits of ML in this field. In particular, there is a need to establish "gold standard" protocols for conducting ML analysis experiments and to improve interactions between microbiome researchers and ML experts. The Machine Learning Techniques in Human Microbiome Studies (ML4Microbiome) COST Action CA18131 is a European network established in 2019 to promote collaboration between discovery-oriented microbiome researchers and data-driven ML experts to optimize and standardize ML approaches for microbiome analysis. This perspective paper presents the key achievements of ML4Microbiome, which include identifying predictive and discriminatory 'omics' features, improving repeatability and comparability, developing automation procedures, and defining priority areas for the novel development of ML methods targeting the microbiome. The insights gained from ML4Microbiome will help to maximize the potential of ML in microbiome research and pave the way for new and improved healthcare practices.

    Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions

    The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to those in other high-throughput studies: the quantitative analyses need to address the heterogeneity of data, specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Nevertheless, although many statistical and machine learning approaches and tools have been developed, new techniques are needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome", which brings together microbiome researchers and machine learning experts to address current challenges such as the standardization of analysis pipelines for reproducibility of data analysis results, benchmarking, and the improvement of existing or development of new tools and ontologies.

    Compression Planner for Time Series Database with GPU Support

    Nowadays, we can observe increasing interest in the processing and exploration of time series. Growing volumes of data and the need for efficient processing have pushed research in new directions. This paper presents a lossless lightweight compression planner intended for use in a time series database system. We propose a novel compression method that is ultra-fast and tries to find the best possible compression ratio by composing several lightweight algorithms tuned dynamically for incoming data. The preliminary results are promising and open new horizons for data-intensive monitoring and analytic systems.

    Improving Multivariate Time Series Forecasting with Random Walks with Restarts on Causality Graphs


    Finding relevant multivariate models for multi-plant photovoltaic energy forecasting

    Forecasting photovoltaic power is useful for optimizing and controlling the system. It aims to predict the power production based on internal and external variables. This problem is very similar to multiple time series forecasting. When multiple predictor variables are present, not all of them contribute equally to the prediction. The goal is, given a set of predictors, to find the subset(s) leading to the most accurate forecast. In this work, we present a feature selection and model matching framework. The idea is to find the optimal combination of a forecasting model and the most relevant features for a given variable. We use a variety of causality-based selection approaches and dimension reduction techniques. The experiments are conducted on real data, and the results support the usefulness of the proposed approach.